AITopics | topic detection

Collaborating Authors

topic detection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Topic-aware Large Language Models for Summarizing the Lived Healthcare Experiences Described in Health Stories

Bilalpur, Maneesh, Hamm, Megan, Lee, Young Ji, Norman, Natasha, McTigue, Kathleen M., Wang, Yanshan

arXiv.org Artificial IntelligenceOct-30-2025

Storytelling is a powerful form of communication and may provide insights into factors contributing to gaps in healthcare outcomes. To determine whether Large Language Models (LLMs) can identify potential underlying factors and avenues for intervention, we performed topic-aware hierarchical summarization of narratives from African American (AA) storytellers. Fifty transcribed stories of AA experiences were used to identify topics in their experience using the Latent Dirichlet Allocation (LDA) technique. Stories about a given topic were summarized using an open-source LLM-based hierarchical summarization approach. Topic summaries were generated by summarizing across story summaries for each story that addressed a given topic. Generated topic summaries were rated for fabrication, accuracy, comprehensiveness, and usefulness by the GPT4 model, and the model's reliability was validated against the original story summaries by two domain experts. 26 topics were identified in the fifty AA stories. The GPT4 ratings suggest that topic summaries were free from fabrication, highly accurate, comprehensive, and useful. The reliability of GPT ratings compared to expert assessments showed moderate to high agreement. Our approach identified AA experience-relevant topics such as health behaviors, interactions with medical team members, caregiving and symptom management, among others. Such insights could help researchers identify potential factors and interventions by learning from unstructured narratives in an efficient manner-leveraging the communicative power of storytelling. The use of LDA and LLMs to identify and summarize the experience of AA individuals suggests a variety of possible avenues for health research and possible clinical improvements to support patients and caregivers, thereby ultimately improving health outcomes.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.24765

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting webpages for Web Topic Detection

Pang, Junbiao, Hu, Anjing, Huang, Qingming

arXiv.org Artificial IntelligenceSep-18-2024

Organizing interesting webpages into hot topics is one of key steps to understand the trends of multimodal web data. A state-of-the-art solution is firstly to organize webpages into a large volume of multi-granularity topic candidates; hot topics are further identified by estimating their interestingness. However, these topic candidates contain a large number of fragments of hot topics due to both the inefficient feature representations and the unsupervised topic generation. This paper proposes a bundling-refining approach to mine more complete hot topics from fragments. Concretely, the bundling step organizes the fragment topics into coarse topics; next, the refining step proposes a submodular-based method to refine coarse topics in a scalable approach. The propose unconventional method is simple, yet powerful by leveraging submodular optimization, our approach outperforms the traditional ranking methods which involve the careful design and complex steps. Extensive experiments demonstrate that the proposed approach surpasses the state-of-the-art method (i.e., latent Poisson deconvolution Pang et al. (2016)) 20% accuracy and 10% one on two public data sets, respectively.

hot topic, interestingness, webpage, (14 more...)

arXiv.org Artificial Intelligence

2409.1238

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

A comprehensive study on Frequent Pattern Mining and Clustering categories for topic detection in Persian text stream

Zafarani-Moattar, Elnaz, Kangavari, Mohammad Reza, Rahmani, Amir Masoud

arXiv.org Artificial IntelligenceMar-15-2024

Topic detection is a complex process and depends on language because it somehow needs to analyze text. There have been few studies on topic detection in Persian, and the existing algorithms are not remarkable. Therefore, we aimed to study topic detection in Persian. The objectives of this study are: 1) to conduct an extensive study on the best algorithms for topic detection, 2) to identify necessary adaptations to make these algorithms suitable for the Persian language, and 3) to evaluate their performance on Persian social network texts. To achieve these objectives, we have formulated two research questions: First, considering the lack of research in Persian, what modifications should be made to existing frameworks, especially those developed in English, to make them compatible with Persian? Second, how do these algorithms perform, and which one is superior? There are various topic detection methods that can be categorized into different categories. Frequent pattern and clustering are selected for this research, and a hybrid of both is proposed as a new category. Then, ten methods from these three categories are selected. All of them are re-implemented from scratch, changed, and adapted with Persian. These ten methods encompass different types of topic detection methods and have shown good performance in English. The text of Persian social network posts is used as the dataset. Additionally, a new multiclass evaluation criterion, called FS, is used in this paper for the first time in the field of topic detection. Approximately 1.4 billion tokens are processed during experiments. The results indicate that if we are searching for keyword-topics that are easily understandable by humans, the hybrid category is better. However, if the aim is to cluster posts for further analysis, the frequent pattern category is more suitable.

category, detection, topic detection, (15 more...)

arXiv.org Artificial Intelligence

2403.10237

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Add feedback

Comparing Human-Centered Language Modeling: Is it Better to Model Groups, Individual Traits, or Both?

Soni, Nikita, Balasubramanian, Niranjan, Schwartz, H. Andrew, Hovy, Dirk

arXiv.org Artificial IntelligenceJan-23-2024

Natural language processing has made progress in incorporating human context into its models, but whether it is more effective to use group-wise attributes (e.g., over-45-year-olds) or model individuals remains open. Group attributes are technically easier but coarse: not all 45-year-olds write the same way. In contrast, modeling individuals captures the complexity of each person's identity. It allows for a more personalized representation, but we may have to model an infinite number of users and require data that may be impossible to get. We compare modeling human context via group attributes, individual users, and combined approaches. Combining group and individual features significantly benefits user-level regression tasks like age estimation or personality assessment from a user's documents. Modeling individual users significantly improves the performance of single document-level classification tasks like stance and topic detection. We also find that individual-user modeling does well even without user's historical data.

computational linguistic, human context, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2401.12492

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Maryland > Baltimore (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.48)

Add feedback

Multi-turn Dialogue Comprehension from a Topic-aware Perspective

Ma, Xinbei, Xu, Yi, Zhao, Hai, Zhang, Zhuosheng

arXiv.org Artificial IntelligenceSep-18-2023

Dialogue related Machine Reading Comprehension requires language models to effectively decouple and model multi-turn dialogue passages. As a dialogue development goes after the intentions of participants, its topic may not keep constant through the whole passage. Hence, it is non-trivial to detect and leverage the topic shift in dialogue modeling. Topic modeling, although has been widely studied in plain text, deserves far more utilization in dialogue reading comprehension. This paper proposes to model multi-turn dialogues from a topic-aware perspective. We start with a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. Then we use these fragments as topic-aware language processing units in further dialogue comprehension. On one hand, the split segments indict specific topics rather than mixed intentions, thus showing convenient on in-domain topic detection and location. For this task, we design a clustering system with a self-training auto-encoder, and we build two constructed datasets for evaluation. On the other hand, the split segments are an appropriate element of multi-turn dialogue response selection. For this purpose, we further present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements and matches response candidates with a dual cross-attention. Empirical studies on three public benchmarks show great improvements over baselines. Our work continues the previous studies on document topic, and brings the dialogue modeling to a novel topic-aware perspective with exhaustive experiments and analyses.

dataset, dialogue, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2309.09666

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Persian topic detection based on Human Word association and graph embedding

Ranjbar-Khadivi, Mehrdad, Akbarpour, Shahin, Feizi-Derakhshi, Mohammad-Reza, Anari, Babak

arXiv.org Artificial IntelligenceJul-18-2023

In this paper, we propose a framework to detect topics in social media based on Human Word Association. Identifying topics discussed in these media has become a critical and significant challenge. Most of the work done in this area is in English, but much has been done in the Persian language, especially microblogs written in Persian. Also, the existing works focused more on exploring frequent patterns or semantic relationships and ignored the structural methods of language. In this paper, a topic detection framework using HWA, a method for Human Word Association, is proposed. This method uses the concept of imitation of mental ability for word association. This method also calculates the Associative Gravity Force that shows how words are related. Using this parameter, a graph can be generated. The topics can be extracted by embedding this graph and using clustering methods. This approach has been applied to a Persian language dataset collected from Telegram. Several experimental studies have been performed to evaluate the proposed framework's performance. Experimental results show that this approach works better than other topic detection methods.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.09775

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)
Asia > Azerbaijan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Add feedback

A Human Word Association based model for topic detection in social networks

Khadivi, Mehrdad Ranjbar, Akbarpour, Shahin, Feizi-Derakhshi, Mohammad-Reza, Anari, Babak

arXiv.org Artificial IntelligenceJul-18-2023

With the widespread use of social networks, detecting the topics discussed in these networks has become a significant challenge. The current works are mainly based on frequent pattern mining or semantic relations, and the language structure is not considered. The meaning of language structural methods is to discover the relationship between words and how humans understand them. Therefore, this paper uses the Concept of the Imitation of the Mental Ability of Word Association to propose a topic detection framework in social networks. This framework is based on the Human Word Association method. A special extraction algorithm has also been designed for this purpose. The performance of this method is evaluated on the FA-CUP dataset. It is a benchmark dataset in the field of topic detection. The results show that the proposed method is a good improvement compared to other methods, based on the Topic-recall and the keyword F1 measure. Also, most of the previous works in the field of topic detection are limited to the English language, and the Persian language, especially microblogs written in this language, is considered a low-resource language. Therefore, a data set of Telegram posts in the Farsi language has been collected. Applying the proposed method to this dataset also shows that this method works better than other topic detection methods.

information retrieval, machine learning, pattern recognition, (21 more...)

arXiv.org Artificial Intelligence

2301.13066

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)
Asia > Azerbaijan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (0.46)
Leisure & Entertainment > Sports > Soccer (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Add feedback

A modified model for topic detection from a corpus and a new metric evaluating the understandability of topics

Kitano, Tomoya, Miyatake, Yuto, Furihata, Daisuke

arXiv.org Artificial IntelligenceJun-8-2023

This paper presents a modified neural model for topic detection from a corpus and proposes a new metric to evaluate the detected topics. The new model builds upon the embedded topic model incorporating some modifications such as document clustering. Numerical experiments suggest that the new model performs favourably regardless of the document's length. The new metric, which can be computed more efficiently than widely-used metrics such as topic coherence, provides variable information regarding the understandability of the detected topics.

artificial intelligence, information retrieval, natural language, (18 more...)

arXiv.org Artificial Intelligence

2306.04941

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.06)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.63)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.39)

Add feedback

Using meaning instead of words to track topics

Poumay, Judicael, Ittoo, Ashwin

arXiv.org Artificial IntelligenceJan-2-2023

The ability to monitor the evolution of topics over time is extremely valuable for businesses. Currently, all existing topic tracking methods use lexical information by matching word usage. However, no studies has ever experimented with the use of semantic information for tracking topics. Hence, we explore a novel semantic-based method using word embeddings. Our results show that a semantic-based approach to topic tracking is on par with the lexical approach but makes different mistakes. This suggest that both methods may complement each other.

artificial intelligence, information, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-08473-7_42

2301.00565

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Media > News (0.48)
Information Technology > Security & Privacy (0.30)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.35)

Add feedback

Topic Detection in Continuous Sign Language Videos

Budria, Alvaro, Tarres, Laia, Gallego, Gerard I., Moreno-Noguer, Francesc, Torres, Jordi, Giro-i-Nieto, Xavier

arXiv.org Artificial IntelligenceSep-1-2022

Significant progress has been made recently on challenging tasks in automatic sign language understanding, such as sign language recognition, translation and production. However, these works have focused on datasets with relatively few samples, short recordings and limited vocabulary and signing space. In this work, we introduce the novel task of sign language topic detection. We base our experiments on How2Sign, a large-scale video dataset spanning multiple semantic domains. We provide strong baselines for the task of topic detection and present a comparison between different visual features commonly used in the domain of sign language.

computer vision, recognition, video, (12 more...)

arXiv.org Artificial Intelligence

2209.02402

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Berlin (0.04)
Europe > Belgium (0.04)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback